15.08.2024
Researchers funded by the SNSF are expected to share their datasets in public repositories. A first look shows that many researchers are not regularly reporting their datasets to the SNSF, but most of those provided follow FAIR principles.
Since the introduction of its Open Research Data (ORD) policy in 2017, the SNSF has required the submission of a Data Management Plan (DMP) for most of its funding schemes. Data produced by all funded research are expected to be deposited in public repositories and to follow the FAIR data sharing principles.
The aim of a Data Management Plan (DMP) is to define the intended life cycle of the research data produced over the course of a grant. It offers a long-term perspective by outlining how data will be generated, collected, documented, shared and preserved. The SNSF provides a template to help researchers complete their Data Management Plans. Details can be found under the DMP guidelines for researchers.
The FAIR principles represent a set of guiding principles to make research datasets Findable, Accessible, Interoperable, and Reusable. The SNSF requires data to be reusable without restriction, provided there are no legal, ethical, copyright or other issues. Open Research Data and the FAIR principles are valued by the SNSF as they contribute to the impact, transparency and reproducibility of research. Details can be found on the SNSF Open Research Data page. To make the transition towards FAIR research data easier, the SNSF decided to define a set of minimum criteria that repositories must fulfil to conform with the FAIR Data Principles.
The percentage of completed SNSF grants that have declared at least one dataset to the SNSF as part of their output data (see the information box on output data collection) is continuously increasing across all SNSF funding schemes and research domains¹.
Grants in Mathematics, Informatics, Natural sciences, and Technology (MINT) show the largest increase (+26 percentage points since 2017/18). Life sciences (LS) also show an increase in declared datasets since 2017/18 (+17 percentage points). In the Social sciences and humanities (SSH), the share of grants with declared datasets increased between 2017/18 and 2021 (+9 percentage points), but has flattened since then (+2 percentage points between 2021 and 2023). In SSH, some disciplines deal with sensitive data and often have longer publication cycles, especially in the social sciences.
Researchers who carried out SNSF-funded grants that ended in 2023 were required to deliver a DMP before they started. Many of the DMPs included an intent to publish datasets on FAIR and often open repositories (see also the SNSF’s first report on its open research data compliance). Our analysis shows that only 23% of these grants (363 out of the 1548 grants completed in 2023) declared at least one dataset. On average, each of these grants with datasets shared 3.7 datasets, amounting to a total of 1344 declared datasets.
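The two figures quoted above follow directly from the counts given in the text. As a quick check, using only those counts (the variable names are chosen here for illustration):

```python
# Counts quoted in the text for grants completed in 2023.
grants_completed = 1548
grants_with_datasets = 363
datasets_declared = 1344

# Share of completed grants that declared at least one dataset.
share = grants_with_datasets / grants_completed

# Mean number of datasets among the grants that declared any.
mean_per_grant = datasets_declared / grants_with_datasets

print(f"Share of grants with at least one dataset: {share:.1%}")
print(f"Mean datasets per declaring grant: {mean_per_grant:.1f}")
```

The share rounds to 23% and the mean to 3.7 datasets per declaring grant, matching the values reported above.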
Putting ORD into context with open access publishing (OA), we note that most completed SNSF grants declare several scientific publications that were mostly open access. Often, such publications rely on datasets that should be declared as research output. This raises the question of why the ORD share stands at only 23%. The reasons for this low percentage are varied and difficult to fully identify:
The current state of affairs shows that the SNSF needs to continue to raise awareness of this topic. One step the SNSF is taking is the implementation of this ORD monitoring, which will be conducted regularly in the future. By openly monitoring and publishing these results, we aim to highlight the importance of good ORD practices.
In an international comparison, the low share of grants declaring at least one dataset is consistent with a study conducted by the publisher PLOS. The study reported that about 28% of PLOS research articles were linked to a dataset shared in a repository, versus 15% of other publicly available research articles from PubMed Central. The results are also consistent with the European Research Data Landscape survey, which found that 22% of respondents stored data in research data repositories during their current or most recent research activity. The fact that ORD percentages are at a similar level at other organizations suggests that the low share of datasets declared to the SNSF can also be explained on a structural level.
The current monitoring result reflects a systemic issue: ORD is not yet as firmly anchored in academia as OA. However, the numbers indicate that ORD practices are becoming more common. With its ORD policy, the SNSF supports this development and sets an example for more transparency in the academic system.
As shown in the next figure, when sharing datasets, researchers choose hosting solutions that in most cases follow FAIR principles. Nevertheless, FAIR sharing is not synonymous with open sharing. Sometimes datasets are not shared openly because of legitimate data protection regulations, but not always. A first analysis indicates that only about half of the declared datasets could be identified as open, while the status of the other half was unclear (see the “How are output data collected for SNSF grants?” box at the end of the article).
Since 2017, Zenodo has become increasingly popular. Only four years later, it was the repository of choice for 40% of the declared datasets. Apart from a few widely used repositories (mainly Zenodo and the ETH Research Collection), the use of repositories is fragmented, with preferences depending on the research domain (Open Science Framework and SWISSUbase for SSH, and Gene Expression Omnibus for LS). This fragmentation reflects the great variety of data generated in the diverse grants funded by the SNSF.
A growing share of SNSF grants declare and share datasets on repositories that comply with the FAIR principles. This points to a growing awareness that research output goes beyond scientific articles and that (meta)data sharing provides important and valuable information. Nevertheless, while a majority of scientific publications resulting from SNSF grants are open access, there is significant room for improvement when it comes to publishing and declaring datasets. The current scientific reward system is still focussed too heavily on the publication of scientific articles without the underlying datasets. With the national ORD strategy and the underlying action plan, the SNSF and its partners contribute to the shift towards Open Science practices and to the recognition of datasets as important research output.
Since 2011, grantees have been asked to report the output produced by their research to the SNSF (the “Dataset” category was added in 2018). Grantees can enter output data at any time, during a grant or after its completion. They are reminded to report output data when they submit a scientific report (annual, mid-term, or final report) and 1.5 years after the end of a grant.
The data used in this story are from the “Output data: Datasets” file available in the Datasets section of the SNSF Data Portal. We considered grants from all funding schemes (except Science Communication and Infrastructure).
To compute the rate of grants with datasets, we considered grants that ended between October 2017 and December 2023. For the last two figures, the “Output data: Datasets” data were collected mid-March 2023 and we considered grants that ended between October 2017 and December 2022.
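The rate described above can be sketched in a few lines. This is a minimal illustration on made-up records, not the code used for the data story: the record layout, the function name `share_with_datasets` and the choice of inclusive day boundaries (1 October 2017 to 31 December 2023) are all assumptions.

```python
from datetime import date

# Hypothetical toy records: (grant number, end date of the grant).
grants = [
    ("G1", date(2018, 3, 31)),
    ("G2", date(2021, 9, 30)),
    ("G3", date(2023, 12, 31)),
    ("G4", date(2016, 12, 31)),  # ended before the window, so ignored
]
# Hypothetical dataset declarations, one grant number per declared dataset.
dataset_grant_numbers = ["G1", "G1", "G3", "G4"]


def share_with_datasets(grants, dataset_grant_numbers, start, end):
    """Share of grants ending in [start, end] that declared at least one dataset."""
    in_window = {number for number, ended in grants if start <= ended <= end}
    if not in_window:
        return 0.0
    # A grant counts once, however many datasets it declared.
    declaring = in_window & set(dataset_grant_numbers)
    return len(declaring) / len(in_window)


rate = share_with_datasets(
    grants, dataset_grant_numbers, date(2017, 10, 1), date(2023, 12, 31)
)
print(f"{rate:.0%} of grants in the window declared at least one dataset")
```

Counting each grant once keeps the rate comparable across grants that declare one dataset and grants that declare many.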
The data were manually curated to check the FAIRness of the repositories according to SNSF guidelines. It is worth mentioning that FAIRness evolves over time, so this assessment may not reflect the current compliance of the data repositories analysed in this study with the SNSF ORD criteria.
Grantees are required to publish datasets supporting the research published in scientific publications resulting from SNSF grants. Data should be publicly accessible provided there are no legal, ethical, copyright or other issues. The openness of a dataset identified with a DOI was defined based on metadata provided by DataCite. A dataset was considered open if the metadata indicated the dataset was open or associated with a public license, or had one of the following licenses:
For datasets without metadata on openness or associated license, their openness status was considered to be unknown.
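The rule above can be sketched as a small classifier over the rights entries of a DataCite record. This is an illustration under stated assumptions, not the procedure used for the data story: the function name `classify_openness` is hypothetical, the set of accepted licenses is passed in by the caller because it is not reproduced here, and treating an "open access" label or an `openAccess` rights URI as the sign that the metadata indicate openness is a guess at one possible convention. The keys `rights`, `rightsUri` and `rightsIdentifier` are those used for rights entries in DataCite metadata.

```python
def classify_openness(rights_list, open_licenses):
    """Return "open" or "unknown" for one dataset's rights metadata.

    rights_list: list of dicts with optional "rights", "rightsUri" and
        "rightsIdentifier" keys (may be empty or None).
    open_licenses: license identifiers accepted as open (caller-supplied).
    """
    allowed = {name.lower() for name in open_licenses}
    for entry in rights_list or []:
        identifier = (entry.get("rightsIdentifier") or "").lower()
        uri = (entry.get("rightsUri") or "").lower()
        label = (entry.get("rights") or "").lower()
        # Case 1: the license identifier is in the accepted set.
        if identifier in allowed:
            return "open"
        # Case 2 (assumed convention): the metadata flag open access.
        if uri.endswith("openaccess") or "open access" in label:
            return "open"
    # No usable information on openness or license.
    return "unknown"


# Illustrative license set only; NOT the list used in the analysis.
example_licenses = {"cc-by-4.0", "cc0-1.0"}

print(classify_openness([{"rightsIdentifier": "CC-BY-4.0"}], example_licenses))
print(classify_openness([], example_licenses))
```

The sketch deliberately has only two outcomes, open and unknown, since missing metadata does not show that a dataset is closed.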
Data, text and code of this data story are available on GitHub and archived on Zenodo.
DOI: 10.46446/datastory.open-research-data-2023
¹ Infrastructure and science communication grants are excluded from this analysis.